Rutherford
FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs
Wan, Yingjia, Tan, Haochen, Zhu, Xiao, Zhou, Xinyu, Li, Zhiwei, Lv, Qingsong, Sun, Changxuan, Zeng, Jiaqi, Xu, Yi, Lu, Jianqiao, Liu, Yinhong, Guo, Zhijiang
Evaluating the factuality of long-form generations from Large Language Models (LLMs) remains challenging due to efficiency bottlenecks and reliability concerns. Prior efforts attempt this by decomposing text into claims, searching for evidence, and verifying claims, but suffer from critical drawbacks: (1) inefficiency due to overcomplicated pipeline components, and (2) ineffectiveness stemming from inaccurate claim sets and insufficient evidence. To address these limitations, we propose FaStfact, an evaluation framework that achieves the highest alignment with human evaluation and the best time/token efficiency among existing baselines. FaStfact first employs chunk-level claim extraction integrated with confidence-based pre-verification, significantly reducing the time and token cost while ensuring reliability. For searching and verification, it collects document-level evidence from crawled web pages and selectively retrieves it during verification. Extensive experiments on an annotated benchmark, FaStfact-Bench, demonstrate the reliability of FaStfact in evaluating long-form factuality both efficiently and effectively. Code, benchmark data, and the annotation interface tool are available at https://github.com/Yingjia-Wan/FaStfact.
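The idea of confidence-based pre-verification described in the abstract can be illustrated with a minimal sketch. This is one plausible reading, not the authors' implementation: the `Claim` fields, the `evaluate` function, the `threshold` value, and the stub verifier are all hypothetical names introduced here. Claims whose extractor confidence clears the threshold keep their pre-verification label, and only the remaining claims are sent to the (costlier) evidence-based verifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Claim:
    text: str
    pre_label: bool    # hypothetical: extractor's own guess that the claim is supported
    confidence: float  # hypothetical: extractor's self-reported confidence in [0, 1]


def evaluate(
    claims: List[Claim],
    verify_with_evidence: Callable[[str], bool],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> Tuple[float, int]:
    """Return (fraction of supported claims, number of claims sent to search)."""
    supported = 0
    searched = 0
    for claim in claims:
        if claim.confidence >= threshold:
            # Confident pre-verification: skip search and keep the label.
            verdict = claim.pre_label
        else:
            # Uncertain claim: fall back to evidence-based verification.
            searched += 1
            verdict = verify_with_evidence(claim.text)
        supported += int(verdict)
    score = supported / len(claims) if claims else 0.0
    return score, searched


# Toy usage with a stub verifier standing in for web search plus an LLM judge.
claims = [
    Claim("Paris is the capital of France.", True, 0.98),
    Claim("The Eiffel Tower opened in 1889.", True, 0.60),
    Claim("The Seine flows through Berlin.", False, 0.95),
]
score, searched = evaluate(claims, verify_with_evidence=lambda text: "1889" in text)
```

In this toy run only the second claim falls below the threshold, so it is the only one that triggers the search step; the saving in time and tokens comes from the claims that skip it.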
- North America > United States > New Jersey > Bergen County > Rutherford (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- Europe > Austria > Vienna (0.14)
- (26 more...)
- Telecommunications (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Information Technology (1.00)
- (4 more...)
Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks
Semenov, Kirill, Sennrich, Rico
For multilingual factual knowledge assessment of LLMs, benchmarks such as MLAMA use template translations that do not take into account the grammatical and semantic information of the named entities inserted in the sentence. This leads to numerous instances of ungrammaticality or wrong wording in the final prompts, which complicates the interpretation of scores, especially for languages that have a rich morphological inventory. In this work, we sample 4 Slavic languages from the MLAMA dataset and compare the knowledge retrieval scores between the initial (templated) MLAMA dataset and its sentence-level translations made by Google Translate and ChatGPT. We observe a significant increase in knowledge retrieval scores, and provide a qualitative analysis of possible reasons behind it. We also analyze 5 more languages from different families and see similar patterns. Therefore, we encourage the community to control the grammaticality of highly multilingual datasets for higher and more interpretable results, which is well approximated by whole-sentence translation with neural MT or LLM systems. The dataset and all related code are published in the GitHub repository: https://github.com/ZurichNLP/Fluent-mLAMA.
Steelers' courtship of Aaron Rodgers is more 'complex' than artificial intelligence, part-owner says
The calendar has turned to May, and Aaron Rodgers is still a free agent. Rodgers has been linked to the Steelers for a couple of months, but Thomas Tull, a part-owner of the Steelers, said the courtship of Rodgers is more "complex" than artificial intelligence. "I'm here to talk about AI, and that's a more complex issue than artificial intelligence," Tull said when asked about Rodgers in an interview on CNBC's "Power Lunch." The team has three quarterbacks on its roster -- Mason Rudolph, Skylar Thompson and sixth-round draft pick Will Howard.
- North America > United States > New York (0.30)
- North America > United States > New Jersey > Bergen County > Rutherford (0.08)
- North America > United States > California > Santa Clara County > Santa Clara (0.06)
- (2 more...)
Aaron Rodgers spotted strolling on beach while NFL awaits free agent decision
Fox News Flash top sports headlines are here. Check out what's clicking on Foxnews.com. The New York Jets officially made four-time MVP quarterback Aaron Rodgers a free agent when the new league year began on Wednesday at 4 p.m. However, Rodgers has had the ability to talk to different teams to find his new home in the NFL for the 2025 season, which has led to reports and speculation from multiple fan bases about where he will end up. While the free agency whirlwind has been going on, it appears Rodgers is at peace, enjoying time by himself soaking in the sun.
- North America > United States > New York (0.68)
- North America > United States > New Jersey > Bergen County > Rutherford (0.07)
- North America > United States > California > Los Angeles County > Los Angeles (0.07)
- (2 more...)
Minds versus Machines: Rethinking Entailment Verification with Language Models
Sanyal, Soumya, Xiao, Tianyi, Liu, Jiacheng, Wang, Wenya, Ren, Xiang
Humans make numerous inferences in text comprehension to understand discourse. This paper aims to understand the commonalities and disparities in the inference judgments between humans and state-of-the-art Large Language Models (LLMs). Leveraging a comprehensively curated entailment verification benchmark, we evaluate both human and LLM performance across various reasoning categories. Our benchmark includes datasets from three categories (NLI, contextual QA, and rationales) that include multi-sentence premises and different knowledge types, thereby evaluating the inference capabilities in complex reasoning instances. Notably, our findings reveal LLMs' superiority in multi-hop reasoning across extended contexts, while humans excel in tasks necessitating simple deductive reasoning. Leveraging these insights, we introduce a fine-tuned Flan-T5 model that outperforms GPT-3.5 and rivals GPT-4, offering a robust open-source solution for entailment verification. As a practical application, we showcase the efficacy of our fine-tuned model in enhancing self-consistency in model-generated explanations, resulting in a 6% performance boost on average across three multiple-choice question-answering datasets.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > North Korea (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (14 more...)
- Leisure & Entertainment (1.00)
- Government > Regional Government > North America Government > United States Government (0.92)
- Media > Film (0.67)
- Education > Educational Setting (0.67)
AI fakes are everywhere -- how to spot them
Tom Hanks isn't trying to sell you a dental plan. I knew this stage of AI tomfoolery was coming, but it's still surprising how fast it's happening. Let's take a closer look at how free and cheap tools are fueling fraud -- and the signs to watch for.
- North America > United States > New Jersey > Bergen County > Rutherford (0.05)
- Europe > France > Île-de-France > Paris > Paris (0.05)
- Media > News (0.52)
- Information Technology > Security & Privacy (0.51)
- Information Technology > Communications > Mobile (0.43)
- Information Technology > Artificial Intelligence > Machine Learning (0.31)
Gayle King fumes over manipulated AI video of her endorsing weight loss company: 'Don't be fooled'
Fake AI pictures and videos will be nearly impossible to discern from real images as the technology behind deepfakes advances, University of California, Berkeley professor says. American television personality Gayle King has warned her followers about the dangers of artificial intelligence (AI) after she became the victim of a manipulated video. A video of King has circulated on Instagram in which she appeared to promote various weight loss products from a company known as Artipet. The sponsored post appeared on the feed of many of the "CBS Mornings" host's one million followers. "Ladies, honestly, I did not expect my weight loss to spark so many questions. My direct messages on Instagram are overflowing," King can be heard saying in the video.
- North America > United States > California > Alameda County > Berkeley (0.25)
- North America > United States > New Jersey > Bergen County > Rutherford (0.05)
Cowboys roll out AI-powered version of Jerry Jones inside AT&T Stadium to take on fan questions
Dallas Cowboys owner Jerry Jones is one of the most recognizable figures in the entire NFL. Football fans would probably jump at the opportunity to ask Jones, who also serves as the team's general manager, questions if they ever had the chance to meet him. But now, artificial intelligence gives anyone who visits AT&T Stadium the chance to ask Jones questions.
- North America > United States > New York (0.06)
- North America > United States > New Jersey > Bergen County > Rutherford (0.06)
- North America > United States > Arkansas (0.06)
Joe Burrow puts up video game numbers in first half
The defending AFC champions heard the noise, so the Cincinnati Bengals reminded people why they made it to the Super Bowl last year. If you saw Joe Burrow's stat sheet from the first half against the Atlanta Falcons, you would be impressed with it through the entirety of four quarters. The Bengals defeated the Jets 27-12. In the first 30 minutes of the ballgame, he threw for 345 yards on 21-of-25 passing, three of them touchdowns.
- North America > United States > New York (0.22)
- North America > United States > Ohio > Hamilton County > Cincinnati (0.07)
- North America > United States > New Jersey > Bergen County > Rutherford (0.07)
Atos Artificial Intelligence to power American Dream attractions
Atos, a global leader in digital transformation, today announced the delivery of its high-powered artificial intelligence, Codex AI Suite, to analyze data and predict attraction performance for Triple Five Group's American Dream retail and entertainment complex. This modern IoT and machine learning platform will enable American Dream to reduce downtime, increase guest satisfaction and lower maintenance costs. Atos' Codex AI Suite collects and stores data from hundreds of ride sensors, which feeds into an algorithm that detects apparent trends, anomalies and unique identifiers of a machine's state. It is an easy-to-use, efficient and cost-effective solution to help American Dream rapidly build and deploy artificial intelligence applications, better extract value from data and develop new business opportunities. The solution will establish a baseline of data analytics using historical data captured from the ride controller and activated through apps rendering on any device.
- North America > United States > Texas > Dallas County > Irving (0.05)
- North America > United States > New Jersey > Bergen County > Rutherford (0.05)
- Europe > France > Île-de-France > Paris > Paris (0.05)
- Information Technology (0.53)
- Leisure & Entertainment > Sports > Olympic Games (0.32)